| Statistic | SequenceLength | OverlapLength | MisMatches |
|---|---|---|---|
| Min. | 14 | 0.0 | 0.00 |
| 1st Qu. | 252 | 102.0 | 0.00 |
| Median | 252 | 153.0 | 0.00 |
| Mean | 241 | 137.7 | 1.38 |
| 3rd Qu. | 253 | 173.0 | 0.00 |
| Max. | 500 | 250.0 | 116.00 |


| Statistic | QueryLength | AlignmentLength | PercentIdentity |
|---|---|---|---|
| Min. | 100.0 | -1595.0 | 50.00 |
| 1st Qu. | 252.0 | 252.0 | 89.68 |
| Median | 253.0 | 253.0 | 92.09 |
| Mean | 246.4 | 246.5 | 91.30 |
| 3rd Qu. | 253.0 | 253.0 | 94.86 |
| Max. | 300.0 | 303.0 | 100.00 |

Subplots of assembled (blue) and aligned (green) sequences are grouped together.
| SampleID | Original | Screened | Aligned | Denoised | NonChimeric | BacteriaOnly | NoMock |
|---|---|---|---|---|---|---|---|
| F3D000 | 7786 | 6836 | 6813 | 6810 | 6374 | 6369 | 6369 |
| F3D001 | 5862 | 5026 | 5009 | 5009 | 4712 | 4705 | 4705 |
| F3D002 | 19610 | 17355 | 17277 | 15105 | 13918 | 13852 | 13852 |
| F3D003 | 6756 | 5955 | 5920 | 11068 | 10057 | 10055 | 10055 |
| F3D005 | 4444 | 3861 | 3844 | 14846 | 13869 | 13831 | 13831 |
| F3D006 | 7985 | 7048 | 7014 | 5224 | 4784 | 4784 | 4784 |
| F3D007 | 5124 | 4538 | 4510 | 2773 | 2521 | 2516 | 2516 |
| F3D008 | 5292 | 4639 | 4611 | 2763 | 2472 | 2472 | 2472 |
| F3D009 | 7065 | 6247 | 6208 | 4048 | 3596 | 3596 | 3596 |
| F3D011 | 17774 | 15226 | 15127 | 6353 | 5734 | 5732 | 5732 |
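
To compare losses across samples of different depth, the counts can be expressed as a percentage of the original reads. The sketch below is only an illustration. It copies three rows from the table above into a data frame called `counts` (a hypothetical name, not an object from the original analysis) and uses base R only.

```r
# Three rows copied from the table above; `counts` is a hypothetical object name.
counts <- data.frame(
  SampleID     = c("F3D000", "F3D001", "F3D002"),
  Original     = c(7786, 5862, 19610),
  Screened     = c(6836, 5026, 17355),
  Aligned      = c(6813, 5009, 17277),
  Denoised     = c(6810, 5009, 15105),
  NonChimeric  = c(6374, 4712, 13918),
  BacteriaOnly = c(6369, 4705, 13852),
  NoMock       = c(6369, 4705, 13852)
)

# All columns except the sample identifier are processing steps.
steps <- setdiff(names(counts), "SampleID")

# Divide every step by the Original count of the same sample (row-wise).
pct_retained <- round(100 * counts[steps] / counts$Original, 1)
pct_retained <- cbind(SampleID = counts$SampleID, pct_retained)
print(pct_retained)
```

A large drop between two adjacent columns points to the step that removes the most reads for that sample.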
Summary of the number of sequences remaining at each step, reported as quartiles together with the minimum, mean and maximum.

| Statistic | Original | Screened | Aligned | Denoised | NonChimeric | BacteriaOnly | NoMock |
|---|---|---|---|---|---|---|---|
| Min. | 14 | 8 | 6 | 6 | 6 | 6 | 6 |
| 1st Qu. | 5365 | 4688 | 4658 | 4656 | 4331 | 4322 | 4322 |
| Median | 7996 | 7046 | 7012 | 7008 | 6580 | 6568 | 6568 |
| Mean | 10088 | 8811 | 8767 | 8762 | 8114 | 8108 | 8108 |
| 3rd Qu. | 13630 | 11928 | 11870 | 11863 | 10737 | 10731 | 10731 |
| Max. | 40077 | 34820 | 34589 | 34566 | 32147 | 32142 | 32142 |

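
A per-step summary of this kind can be produced with base R's `summary()`. The sketch below is a minimal illustration, not the code used for the report. The data frame `counts` is a hypothetical stand-in that holds only three samples and three of the steps, with values taken from the per-sample table above.

```r
# Hypothetical stand-in: three samples, three steps, values from the per-sample table.
counts <- data.frame(
  Original = c(7786, 5862, 19610),
  Screened = c(6836, 5026, 17355),
  NoMock   = c(6369, 4705, 13852)
)

# summary() reports Min., 1st Qu., Median, Mean, 3rd Qu. and Max. per numeric column.
step_summary <- summary(counts)
print(step_summary)

# The quartiles can also be extracted as plain numbers for further calculations.
quartiles <- sapply(counts, quantile, probs = c(0.25, 0.5, 0.75))
print(quartiles)
```

The `quartiles` matrix has one column per step and one row per requested probability, which is easier to reuse than the formatted `summary()` table.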
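
A tibble: 10 x 7 (all columns of type `<dbl>`)

| Row | Original | Screened | Aligned | Denoised | NonChimeric | BacteriaOnly | NoMock |
|---|---|---|---|---|---|---|---|
| 1 | NA | NA | NA | NA | NA | NA | NA |
| 2 | 252 | 252 | 252 | 252 | 252 | 252 | 252 |
| 3 | 252 | 252 | 252 | 252 | 252 | 252 | 252 |
| 4 | 253 | 253 | 253 | 252 | 252 | 252 | 252 |
| 5 | 253 | 155 | 252 | 252 | 252 | 252 | 252 |
| 6 | 252 | 252 | 253 | 252 | 252 | 252 | 252 |
| 7 | 124 | 252 | 252 | 253 | 253 | 253 | 253 |
| 8 | 252 | 253 | 252 | 252 | 252 | 252 | 252 |
| 9 | 253 | 252 | 253 | 253 | 253 | 253 | 253 |
| 10 | 253 | 252 | 211 | 253 | 253 | 253 | 253 |
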
Note that with a large number of samples the x-axis labels become difficult to read. In such situations it helps to split the samples into groups (see example below).
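
One possible way to split the samples is to assign them to fixed-size panels before plotting. The sketch below uses base R only. The sample IDs and the panel size of 10 are assumptions chosen for illustration, not values from the analysis.

```r
# Hypothetical sample IDs; replace with the sample names of the real data set.
sample_ids <- sprintf("F3D%03d", 0:24)

# Assumed panel size: at most 10 samples per plot.
per_panel <- 10

# Assign consecutive samples to panels 1, 2, 3, ...
panel <- ceiling(seq_along(sample_ids) / per_panel)
sample_groups <- split(sample_ids, panel)

# Number of samples in each panel
print(lengths(sample_groups))
```

Each element of `sample_groups` can then be plotted on its own. Alternatively, the `panel` vector can be added as a column to the plotting data and passed to `ggplot2::facet_wrap()` with `scales = "free_x"`.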
Below is a command for subsetting samples by sequence count. It shows how to select samples with fewer than 2000 sequences.
library(dplyr)
# Keep only the rows whose sequence count (value) is below 2000
subsetlt2000 <- seqcount.v.m %>% as.data.frame() %>%
  dplyr::filter(value < 2000)
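
The command above depends on `seqcount.v.m`, which is not shown here. The sketch below assumes it is a long-format table with a numeric `value` column and builds a small stand-in to show what the filter returns. The sample IDs, the other column names and all counts are made up for illustration. Base R subsetting stands in for `dplyr::filter()`.

```r
# Stand-in for seqcount.v.m: long format, one row per sample and step.
# SampleID, variable and all numbers are assumptions for illustration.
seqcount_long <- data.frame(
  SampleID = c("S1", "S2", "S3", "S4"),
  variable = "NoMock",
  value    = c(6369, 1500, 2516, 6),
  stringsAsFactors = FALSE
)

# Base R counterpart of dplyr::filter(value < 2000).
# The two agree when `value` has no NA; filter() would drop NA rows.
subset_lt2000 <- seqcount_long[seqcount_long$value < 2000, ]
print(subset_lt2000)

# The complementary set: samples that reach the threshold
subset_ge2000 <- seqcount_long[seqcount_long$value >= 2000, ]
print(subset_ge2000)
```

Inspecting the low-count subset first makes it easier to decide whether 2000 is a sensible cutoff before any samples are discarded.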
Shows maximum sequence depth.
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.4
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] scales_1.0.0 ggpubr_0.2 magrittr_1.5 dplyr_0.8.0.1 ggplot2_3.1.0
[6] readr_1.3.1
loaded via a namespace (and not attached):
[1] Biobase_2.42.0 tidyr_0.8.3 jsonlite_1.6
[4] splines_3.5.2 foreach_1.4.4 assertthat_0.2.1
[7] highr_0.8 stats4_3.5.2 phyloseq_1.26.1
[10] yaml_2.2.0 slam_0.1-45 pillar_1.3.1
[13] lattice_0.20-38 glue_1.3.1 digest_0.6.18
[16] XVector_0.22.0 colorspace_1.4-1 cowplot_0.9.4
[19] htmltools_0.3.6 Matrix_1.2-15 plyr_1.8.4
[22] tm_0.7-6 pkgconfig_2.0.2 microbiome_1.4.2
[25] zlibbioc_1.28.0 purrr_0.3.2 tibble_2.1.1
[28] mgcv_1.8-27 IRanges_2.16.0 withr_2.1.2
[31] BiocGenerics_0.28.0 lazyeval_0.2.1 cli_1.1.0
[34] NLP_0.2-0 survival_2.43-3 crayon_1.3.4
[37] evaluate_0.13 fansi_0.4.0 nlme_3.1-137
[40] MASS_7.3-51.1 xml2_1.2.0 vegan_2.5-4
[43] tools_3.5.2 data.table_1.12.0 hms_0.4.2
[46] stringr_1.4.0 Rhdf5lib_1.4.3 S4Vectors_0.20.1
[49] munsell_0.5.0 cluster_2.0.7-1 Biostrings_2.50.2
[52] ade4_1.7-13 compiler_3.5.2 rlang_0.3.4
[55] rhdf5_2.26.2 grid_3.5.2 iterators_1.0.10
[58] biomformat_1.10.1 igraph_1.2.4 labeling_0.3
[61] rmarkdown_1.12 gtable_0.2.0 codetools_0.2-16
[64] multtest_2.38.0 reshape2_1.4.3 iNEXT_2.0.19
[67] R6_2.4.0 knitr_1.22 utf8_1.1.4
[70] permute_0.9-4 ape_5.2 stringi_1.4.3
[73] parallel_3.5.2 Rcpp_1.0.1 tidyselect_0.2.5
[76] xfun_0.6